A MEASURE OF SIMILARITY BETWEEN GRAPH VERTICES: 
APPLICATIONS TO SYNONYM EXTRACTION AND WEB 
Abstract. We introduce a concept of similarity between vertices of directed graphs. Let $G_A$ and
$G_B$ be two directed graphs with, respectively, $n_A$ and $n_B$ vertices. We define an $n_B \times n_A$
similarity matrix $S$ whose real entry $s_{ij}$ expresses how similar vertex $j$ (in $G_A$) is to vertex $i$
(in $G_B$); we say that $s_{ij}$ is their similarity score. The similarity matrix can be obtained as the
limit of the normalized even iterates of $S(k+1) = B S(k) A^T + B^T S(k) A$, where $A$ and $B$ are
adjacency matrices of the graphs and $S(0)$ is a matrix whose entries are all equal to one. In the
special case where $G_A = G_B = G$, the matrix $S$ is square and the score $s_{ij}$ is the similarity score
between the vertices $i$ and $j$ of $G$. We point out that Kleinberg's "hub and authority" method to
identify web pages relevant to a given query can be viewed as a special case of our definition in
the case where one of the graphs has two vertices and a unique directed edge between them. In
analogy to Kleinberg, we show that our similarity scores are given by the components of a dominant
eigenvector of a non-negative matrix. Potential applications of our similarity concept are numerous.
We illustrate an application for the automatic extraction of synonyms in a monolingual dictionary.
1. Generalizing hubs and authorities. Efficient web search engines such as
Google are often based on the idea of characterizing the most important vertices
in a graph representing the connections or links between pages on the web. One
such method, proposed by Kleinberg [18], identifies in a set of pages relevant to a
query search the subset of pages that are good hubs or the subset of pages that are
good authorities. For example, for the query "university", the home pages of Oxford,
Harvard and other universities are good authorities, whereas web pages that point to
these home pages are good hubs. Good hubs are pages that point to good authorities,
and good authorities are pages that are pointed to by good hubs. From these implicit
relations, Kleinberg derives an iterative method that assigns an "authority score" and
a "hub score" to every vertex of a given graph. These scores can be obtained as the
limit of a converging iterative process, which we now describe.

Let $G = (V, E)$ be a graph with vertex set $V$ and edge set $E$, and let $h_j$ and
$a_j$ be the hub and authority scores of vertex $j$. We initialize these scores with
some positive values and then update them simultaneously for all vertices according
to the following mutually reinforcing relation: the hub score of vertex $j$ is set equal
to the sum of the authority scores of all vertices pointed to by $j$ and, similarly, the
authority score of vertex $j$ is set equal to the sum of the hub scores of all vertices
pointing to $j$:

$$ h_j \leftarrow \sum_{i:(j,i)\in E} a_i, \qquad a_j \leftarrow \sum_{i:(i,j)\in E} h_i. $$

Let $B$ be the matrix whose $(i,j)$ entry is equal to the number of edges from vertex $i$
to vertex $j$ in $G$ (the adjacency matrix of $G$), and let $h$ and $a$ be the vectors
of hub and authority scores. The above updating equations then take the simple form

$$ \begin{bmatrix} h \\ a \end{bmatrix}_{k+1} = \begin{bmatrix} 0 & B \\ B^T & 0 \end{bmatrix} \begin{bmatrix} h \\ a \end{bmatrix}_{k}, \qquad k = 0, 1, \ldots, $$

which we denote in compact form by

$$ x_{k+1} = M x_k, \qquad k = 0, 1, \ldots, $$

where

$$ x_k = \begin{bmatrix} h \\ a \end{bmatrix}_k, \qquad M = \begin{bmatrix} 0 & B \\ B^T & 0 \end{bmatrix}. $$

Notice that the matrix $M$ is symmetric and non-negative¹. We are only interested in
the relative scores and we will therefore consider the normalized vector sequence

$$ z_0 = x_0 > 0, \qquad z_{k+1} = \frac{M z_k}{\|M z_k\|_2}, \qquad k = 0, 1, \ldots, \tag{1.1} $$

where $\|\cdot\|_2$ is the Euclidean vector norm. Ideally, we would like to take the limit
of the sequence $z_k$ as a definition for the hub and authority scores. There are two
difficulties with such a definition.

A first difficulty is that the sequence $z_k$ does not always converge. In fact,
sequences associated with non-negative matrices $M$ with the above block structure
almost never converge but rather oscillate between the limits

$$ z_{\mathrm{even}} = \lim_{k\to\infty} z_{2k} \quad\text{and}\quad z_{\mathrm{odd}} = \lim_{k\to\infty} z_{2k+1}. $$

We prove in Theorem 2 that this is true in general for symmetric non-negative
matrices: either the sequence resulting from (1.1) converges, or it does not and
then the even and odd subsequences do converge. Let us consider both limits for the
moment.

The second difficulty is that the limit vectors $z_{\mathrm{even}}$ and $z_{\mathrm{odd}}$ do in general depend
on the initial vector $z_0$, and there is no apparently natural choice for $z_0$. The set of
all limit vectors obtained when starting from a positive initial vector is given by

$$ Z = \{ z_{\mathrm{even}}(z_0),\, z_{\mathrm{odd}}(z_0) : z_0 > 0 \}, $$

and we would like to select one particular vector in that set. The vector $z_{\mathrm{even}}$ obtained
for $z_0 = \mathbf{1}$ (we denote by $\mathbf{1}$ the vector, or matrix, whose entries are all equal to 1)
has several interesting features that qualify it as a good choice: it is particularly
easy to compute, it possesses several nice properties (discussed in the later sections), and
it has the extremal property, proved in Theorem 2, of being the unique vector in $Z$
of largest possible 1-norm (the 1-norm of a vector is the sum of the magnitudes
of all its entries). Because of these features, we take the two sub-vectors of $z_{\mathrm{even}}(\mathbf{1})$ as
definitions for the hub and authority scores. In the case of the above matrix $M$, we
have

$$ M^2 = \begin{bmatrix} B B^T & 0 \\ 0 & B^T B \end{bmatrix}, $$

and from this equality it follows that, if the dominant invariant subspaces associated
with $BB^T$ and $B^TB$ have dimension one, then the normalized hub and authority
scores are simply given by the normalized dominant eigenvectors of $BB^T$ and $B^TB$.
This is the definition used in [18] for the authority and hub scores of the vertices of
$G$. The arbitrary choice of $z_0 = \mathbf{1}$ made in [18] is shown here to have an extremal
norm justification. Notice that when the invariant subspace has dimension one,
there is nothing particular about the starting vector $\mathbf{1}$, since any other positive vector
$z_0$ would give the same result.

¹ A matrix or a vector $Z$ is said to be non-negative (positive) if all its components are non-negative
(positive); we write $Z \ge 0$ ($Z > 0$) to denote this.
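As an illustration, the following pure-Python sketch runs the normalized iteration (1.1) with the block matrix $M$ above, starting from $z_0 = \mathbf{1}$ and taking an even number of steps, so that it approximates $z_{\mathrm{even}}(\mathbf{1})$. The function name `hub_authority`, the fixed step count and the small test graph are illustrative assumptions, not part of the original description.

```python
import math


def hub_authority(B, steps=100):
    """Approximate z_even(1) for M = [[0, B], [B^T, 0]] with the normalized iteration (1.1).

    B is a square adjacency matrix given as a list of rows. Returns the hub and
    authority sub-vectors, jointly normalized to unit Euclidean norm.
    """
    n = len(B)
    h = [1.0] * n  # hub part of z_0 = 1
    a = [1.0] * n  # authority part of z_0 = 1
    steps += steps % 2  # use an even number of steps (even iterates)
    for _ in range(steps):
        # simultaneous update: h_j <- sum of a_i over edges j -> i,
        #                      a_j <- sum of h_i over edges i -> j
        new_h = [sum(B[j][i] * a[i] for i in range(n)) for j in range(n)]
        new_a = [sum(B[i][j] * h[i] for i in range(n)) for j in range(n)]
        norm = math.sqrt(sum(x * x for x in new_h + new_a))
        h = [x / norm for x in new_h]
        a = [x / norm for x in new_a]
    return h, a


# Hypothetical example graph with edges 0 -> 1, 0 -> 2 and 3 -> 1.
B = [[0, 1, 1, 0],
     [0, 0, 0, 0],
     [0, 0, 0, 0],
     [0, 1, 0, 0]]
hubs, authorities = hub_authority(B)
```

For this graph $BB^T$ and $B^TB$ have a simple dominant eigenvalue, so vertex 0 receives the largest hub score and vertex 1 the largest authority score whatever positive starting vector is used.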

We now generalize this construction. The authority score of vertex j of G can 
be thought of as a similarity score between vertex j of G and vertex authority of the 
graph 

hub → authority

and, similarly, the hub score of vertex j of G can be seen as a similarity score between 
vertex j and vertex hub. The mutually reinforcing updating iteration used above can 
be generalized to graphs that are different from the hub-authority structure graph. 
The idea of this generalization is easier to grasp with an example; we illustrate it 
first on the path graph with three vertices and then provide a definition for arbitrary 
graphs. Let G be a graph with edge set E and adjacency matrix B and consider the 
structure graph 

1 → 2 → 3.

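With each vertex $i$ of $G$ we now associate three scores $x_{i1}$, $x_{i2}$ and $x_{i3}$, one for each
vertex of the structure graph. We initialize these scores with some positive value and
then update them according to the following mutually reinforcing relation:

$$ \begin{aligned} x_{i1} &\leftarrow \sum_{j:(i,j)\in E} x_{j2}, \\ x_{i2} &\leftarrow \sum_{j:(j,i)\in E} x_{j1} + \sum_{j:(i,j)\in E} x_{j3}, \\ x_{i3} &\leftarrow \sum_{j:(j,i)\in E} x_{j2}, \end{aligned} $$

or, in matrix form (we denote by $x_j$ the column vector with entries $x_{ij}$),

$$ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}_{k+1} = \begin{bmatrix} 0 & B & 0 \\ B^T & 0 & B \\ 0 & B^T & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}_{k}, $$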
Fig. 1.1. A graph and its similarity matrix with the structure graph 1 → 2 → 3. The
similarity score of vertex 3 with vertex 2 of the structure graph is equal to 0.5540.

which we again denote by $x_{k+1} = M x_k$. The situation is now identical to that of
the previous example, and all convergence arguments given there apply here as well.
The matrix $M$ is symmetric and non-negative, the normalized even and odd iterates
converge, and the limit $z_{\mathrm{even}}(\mathbf{1})$ is, among all possible limits, the unique vector with
largest possible 1-norm. We take the three components of this extremal limit $z_{\mathrm{even}}(\mathbf{1})$
as the definition of the similarity scores $s_1$, $s_2$ and $s_3$, and define the similarity matrix by
$S = [s_1\ s_2\ s_3]$. A numerical example of such a similarity matrix is shown in Figure 1.1.
Note that we shall prove in Theorem 6 that the score $s_2$ can be obtained more
directly from $B$ by computing the dominant eigenvector of the matrix $BB^T + B^TB$.

We now come to a description of the general case. Assume that we have two
directed graphs $G_A$ and $G_B$ with $n_A$ and $n_B$ vertices and edge sets $E_A$ and $E_B$. We
think of $G_A$ as a structure graph that plays the role of the graphs hub → authority
and 1 → 2 → 3 in the above examples. We consider real scores $x_{ij}$ for $i = 1, \ldots, n_B$
and $j = 1, \ldots, n_A$ and simultaneously update all scores according to the following
updating equations:

$$ x_{ij} \leftarrow \sum_{r:(r,i)\in E_B,\; s:(s,j)\in E_A} x_{rs} \;+\; \sum_{r:(i,r)\in E_B,\; s:(j,s)\in E_A} x_{rs}. $$

This equation can be given an interpretation in terms of the product graph of $G_A$
and $G_B$. The product graph of $G_A$ and $G_B$ is a graph that has $n_A \cdot n_B$ vertices and
that has an edge from vertex $(i_1, j_1)$ to vertex $(i_2, j_2)$ if there is an edge from $i_1$
to $i_2$ in $G_A$ and there is an edge from $j_1$ to $j_2$ in $G_B$. The updating equation above
is then equivalent to replacing the value of every vertex of the product graph by the sum
of the values of the vertices it is linked to through its outgoing and its incoming edges.

This updating equation can also be written in more compact matrix form. Let $X_k$ be the
$n_B \times n_A$ matrix of entries $x_{ij}$ at iteration $k$. Then the updating equations take the
simple form

$$ X_{k+1} = B X_k A^T + B^T X_k A, \qquad k = 0, 1, \ldots, \tag{1.2} $$

where $A$ and $B$ are the adjacency matrices of $G_A$ and $G_B$. We prove in Section 3
that, as for the above examples, the normalized even and odd iterates of this updating
equation converge, and that the limit $Z_{\mathrm{even}}(\mathbf{1})$ is, among all possible limits, the only
one with largest 1-norm. We take this limit as the definition of the similarity matrix. An
example of two graphs and their similarity matrix is shown in Figure 1.2.
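The equivalence between the element-wise updating equation and the matrix form (1.2) can be checked directly on a small example. In the minimal sketch below (helper names and matrices are hypothetical), each term of the element-wise sums is weighted by the adjacency entries, which assumes that multiple edges are counted with their multiplicity, as they are in the adjacency matrices $A$ and $B$.

```python
def transpose(P):
    return [list(row) for row in zip(*P)]


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def step_elementwise(A, B, X):
    """x_ij <- sum over (r -> i in G_B, s -> j in G_A) + sum over (i -> r in G_B, j -> s in G_A)."""
    nB, nA = len(B), len(A)
    return [[sum(B[r][i] * A[s][j] * X[r][s] + B[i][r] * A[j][s] * X[r][s]
                 for r in range(nB) for s in range(nA))
             for j in range(nA)]
            for i in range(nB)]


def step_matrix(A, B, X):
    """X <- B X A^T + B^T X A, the matrix form (1.2)."""
    first = matmul(matmul(B, X), transpose(A))
    second = matmul(matmul(transpose(B), X), A)
    return [[u + v for u, v in zip(r1, r2)] for r1, r2 in zip(first, second)]


A = [[0, 1], [0, 0]]                    # structure graph hub -> authority
B = [[0, 1, 1], [0, 0, 1], [1, 0, 0]]   # an arbitrary 3-vertex graph
X = [[1, 2], [3, 4], [5, 6]]            # n_B x n_A matrix of scores
```

Both functions return the same $n_B \times n_A$ matrix; here that is `[[10, 5], [6, 1], [2, 4]]`.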



SIMILARITY IN GRAPHS 



Fig. 1.2. Two graphs $G_A$, $G_B$ and their similarity matrix. The vertex of $G_A$ which is most
similar to vertex 5 in $G_B$ is vertex 3.



It is interesting to note that similar ideas have been proposed in the database literature
[20]. The applications there are information retrieval in large databases
with which a particular graph structure can be associated. The ideas presented in
these conference papers are obviously linked to those of the present paper, but in
order to guarantee convergence of the proposed iteration to a unique fixed point,
the iteration has to be slightly modified using, e.g., certain weighting coefficients.
Therefore, the hub and authority scores of Kleinberg are not a special case of their
definition.
definition. We thank the authors of for drawing our attention to those references. 

The rest of this paper is organized as follows. In Section 2 we describe some
standard Perron-Frobenius results for non-negative matrices that are useful in the rest
of the paper. In Section 3 we give a precise definition of the similarity matrix together
with different alternative definitions. The definition immediately translates into an
approximation algorithm, and we discuss in that section some complexity aspects of
the algorithm. In Section 4 we describe similarity matrices for the situation where
one of the two graphs is a path graph of length 2 or 3. In Section 5 we consider
the special case $G_A = G_B = G$, for which the score $s_{ij}$ is the similarity between the
vertices $i$ and $j$ in a single graph $G$. Section 6 deals with graphs whose similarity
matrices have rank one. We prove there that if one of the graphs is regular or if one
of the graphs is undirected, then the similarity matrix has rank one. Regular graphs
are graphs whose vertices have the same in-degrees and the same out-degrees; cycle
graphs, for example, are regular. In a final section we report results obtained for the
automatic synonym extraction in a dictionary by using the central score of a graph.
A short version of this paper appears as a conference contribution in [6].

2. Graphs and non-negative matrices. With any directed graph $G = (V, E)$
one can associate a non-negative matrix via an indexation of its vertices. The
so-called adjacency matrix of $G$ is the matrix $B \in \mathbb{N}^{n\times n}$ whose entry $b_{ij}$ equals the
number of edges from vertex $i$ to vertex $j$. Let $B$ be the adjacency matrix of some
graph $G$; the entry $(B^k)_{ij}$ is equal to the number of paths of length $k$ from vertex $i$
to vertex $j$. From this it follows that a graph is strongly connected if and only if for
every pair of indices $i$ and $j$ there is an integer $k$ such that $(B^k)_{ij} > 0$. Matrices that
satisfy this property are said to be irreducible.
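A small pure-Python sketch of these two facts: `matrix_power` counts paths of a given length through powers of $B$, and `is_irreducible` checks the condition above using only the path lengths $k = 1, \ldots, n$ (a shortest path between distinct vertices has length at most $n-1$, and a shortest cycle through a vertex has length at most $n$). Both helper names and the example graphs are illustrative.

```python
def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def matrix_power(B, k):
    """(B^k)_{ij} counts the paths of length k from vertex i to vertex j."""
    n = len(B)
    P = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        P = matmul(P, B)
    return P


def is_irreducible(B):
    """True if for every pair (i, j) some (B^k)_{ij} > 0; k = 1..n is enough."""
    n = len(B)
    reach = [[False] * n for _ in range(n)]
    P = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(n):
        P = matmul(P, B)  # P = B^k for k = 1, ..., n
        for i in range(n):
            for j in range(n):
                reach[i][j] = reach[i][j] or P[i][j] > 0
    return all(all(row) for row in reach)


cycle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]   # 0 -> 1 -> 2 -> 0, strongly connected
path = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]    # 0 -> 1 -> 2, not strongly connected
```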

In the sequel, we shall need the notion of orthogonal projection on vector
subspaces. Let $\mathcal{V}$ be a linear subspace of $\mathbb{R}^n$ and let $v \in \mathbb{R}^n$. The orthogonal projection of
$v$ on $\mathcal{V}$ is the vector in $\mathcal{V}$ with smallest Euclidean distance to $v$. A matrix
representation of this projection can be obtained as follows. Let $\{v_1, \ldots, v_m\}$ be an orthonormal
basis for $\mathcal{V}$ and arrange the column vectors $v_i$ in a matrix $V$. The projection of $v$ on
$\mathcal{V}$ is then given by $\Pi v = V V^T v$, and the matrix $\Pi = V V^T$ is the orthogonal projector
on $\mathcal{V}$. Projectors have the property that $\Pi^2 = \Pi$.
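The projector construction can be sketched as follows, assuming the subspace is given by a spanning list of vectors that are first orthonormalized with the classical Gram-Schmidt procedure (adequate for this tiny example, though not numerically robust in general). The helper names are illustrative.

```python
import math


def orthonormal_basis(vectors):
    """Classical Gram-Schmidt: an orthonormal basis of the span of the given vectors."""
    basis = []
    for v in vectors:
        w = [float(x) for x in v]
        for q in basis:
            c = sum(x * y for x, y in zip(w, q))
            w = [x - c * y for x, y in zip(w, q)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm > 1e-12:  # skip vectors already in the span
            basis.append([x / norm for x in w])
    return basis


def projector(vectors):
    """Pi = V V^T, where the columns of V are an orthonormal basis of span(vectors)."""
    Q = orthonormal_basis(vectors)
    n = len(vectors[0])
    return [[sum(q[i] * q[j] for q in Q) for j in range(n)] for i in range(n)]


def apply(P, v):
    return [sum(p * x for p, x in zip(row, v)) for row in P]


Pi = projector([[1, 0, 1], [0, 1, 0]])
```

The resulting $\Pi$ is symmetric and idempotent, and it maps $v = (1, 2, 3)$ to $(2, 2, 2)$, the closest point of the subspace spanned by $(1, 0, 1)$ and $(0, 1, 0)$.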

The Perron-Frobenius theory establishes interesting properties about the
eigenvectors and eigenvalues of non-negative matrices. Let the largest magnitude
of the eigenvalues of the matrix $M$ (the spectral radius of $M$) be denoted by $\rho(M)$.
According to the Perron-Frobenius Theorem, the spectral radius of a non-negative
matrix $M$ is an eigenvalue of $M$ (called the Perron root), and there exists an
associated non-negative vector $x \ge 0$ ($x \ne 0$) such that $Mx = \rho x$ (called the Perron
vector). In the case of symmetric matrices, more specific results can be obtained.

Theorem 1. Let $M$ be a symmetric non-negative matrix of spectral radius $\rho$.
Then the algebraic and geometric multiplicities of the Perron root $\rho$ are equal; there is
a non-negative basis $X \ge 0$ for the invariant subspace associated with the Perron root;
and the elements of the orthogonal projector $\Pi$ on the vector space associated with the
Perron root of $M$ are all non-negative.

Proof. We use the facts that any symmetric non-negative matrix $M$ can be
permuted to a block-diagonal matrix with irreducible blocks $M_i$ on the diagonal [14],
and that the algebraic multiplicity of the Perron root of an irreducible non-negative
matrix is equal to one. From these combined facts it follows that the algebraic and
geometric multiplicities of the Perron root $\rho$ of $M$ are equal. Moreover, the
corresponding invariant subspace of $M$ is obtained from the normalized Perron vectors of
the blocks $M_i$ of spectral radius $\rho$, appropriately padded with zeros. The basis $X$ one
obtains that way is then non-negative and orthonormal, and the projector $\Pi = X X^T$ is
therefore non-negative as well. □

The next theorem will be used to justify our definition of the similarity matrix
between two graphs. The result describes the limit vectors of sequences associated with
symmetric non-negative linear transformations.

Theorem 2. Let $M$ be a symmetric non-negative matrix of spectral radius $\rho$.
Let $z_0 > 0$ and consider the sequence

$$ z_{k+1} = \frac{M z_k}{\|M z_k\|_2}, \qquad k = 0, 1, \ldots $$

Two convergence cases can occur depending on whether or not $-\rho$ is an eigenvalue of
$M$. When $-\rho$ is not an eigenvalue of $M$, the sequence $z_k$ simply converges to
$\Pi z_0 / \|\Pi z_0\|_2$, where $\Pi$ is the orthogonal projector on the invariant subspace associated
with the Perron root $\rho$. When $-\rho$ is an eigenvalue of $M$, the subsequences $z_{2k}$
and $z_{2k+1}$ converge to the limits

$$ z_{\mathrm{even}}(z_0) = \lim_{k\to\infty} z_{2k} = \frac{\Pi z_0}{\|\Pi z_0\|_2} \quad\text{and}\quad z_{\mathrm{odd}}(z_0) = \lim_{k\to\infty} z_{2k+1} = \frac{\Pi M z_0}{\|\Pi M z_0\|_2}, $$

where $\Pi$ is the orthogonal projector on the sum of the invariant subspaces associated
with $\rho$ and $-\rho$. In both cases the set of all possible limits is given by

$$ Z = \{ z_{\mathrm{even}}(z_0), z_{\mathrm{odd}}(z_0) : z_0 > 0 \} = \{ \Pi z / \|\Pi z\|_2 : z > 0 \}, $$

and the vector $z_{\mathrm{even}}(\mathbf{1})$ is the unique vector of largest possible 1-norm in that set.

Proof. We prove only the case where $-\rho$ is an eigenvalue; the other case is a
trivial modification. Let us denote the invariant subspaces of $M$ corresponding to $\rho$,
to $-\rho$ and to the rest of the spectrum, respectively, by $\mathcal{V}_\rho$, $\mathcal{V}_{-\rho}$ and $\mathcal{V}_\mu$. Assume that
these spaces are non-trivial, and that we have orthonormal bases $V_\rho$, $V_{-\rho}$ and $V_\mu$ for them:

$$ M V_\rho = \rho V_\rho, \qquad M V_{-\rho} = -\rho V_{-\rho}, \qquad M V_\mu = V_\mu M_\mu, $$

where $M_\mu$ is a square matrix (diagonal if $V_\mu$ is the basis of eigenvectors) with spectral
radius $\mu$ strictly less than $\rho$. The eigenvalue decomposition can then be rewritten in
block diagonal form:



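$$ M = \begin{bmatrix} V_\rho & V_{-\rho} & V_\mu \end{bmatrix} \begin{bmatrix} \rho I & & \\ & -\rho I & \\ & & M_\mu \end{bmatrix} \begin{bmatrix} V_\rho & V_{-\rho} & V_\mu \end{bmatrix}^T = \rho V_\rho V_\rho^T - \rho V_{-\rho} V_{-\rho}^T + V_\mu M_\mu V_\mu^T. $$

It then follows that

$$ M^2 = \rho^2 \Pi + V_\mu M_\mu^2 V_\mu^T, $$

where $\Pi := V_\rho V_\rho^T + V_{-\rho} V_{-\rho}^T$ is the orthogonal projector onto the invariant subspace
$\mathcal{V}_\rho \oplus \mathcal{V}_{-\rho}$ of $M^2$ corresponding to $\rho^2$. We also have

$$ M^{2k} = \rho^{2k} \Pi + V_\mu M_\mu^{2k} V_\mu^T, $$

and since $\rho(M_\mu) = \mu < \rho$, it follows from multiplying this by $z_0$ and $M z_0$ that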



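$$ z_{2k} = \frac{\Pi z_0}{\|\Pi z_0\|_2} + O\big((\mu/\rho)^{2k}\big) \quad\text{and}\quad z_{2k+1} = \frac{\Pi M z_0}{\|\Pi M z_0\|_2} + O\big((\mu/\rho)^{2k}\big), $$

provided the initial vectors $z_0$ and $M z_0$ have a non-zero component in the relevant
subspaces, i.e., provided $\Pi z_0$ and $\Pi M z_0$ are non-zero. But the squared Euclidean norms of
these vectors equal $z_0^T \Pi z_0$ and $z_0^T M \Pi M z_0$ since $\Pi^2 = \Pi$. These norms are both
non-zero since $z_0 > 0$ and both $\Pi$ and $M \Pi M$ are non-negative (Theorem 1 applied to $M^2$)
and non-zero.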

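It follows from the non-negativity of $M$ and the formulas for $z_{\mathrm{even}}(z_0)$ and $z_{\mathrm{odd}}(z_0)$
that both limits lie in $\{\Pi z / \|\Pi z\|_2 : z > 0\}$. Let us now show that every element
$\hat z_0 \in \{\Pi z / \|\Pi z\|_2 : z > 0\}$ can be obtained as $z_{\mathrm{even}}(z_0)$ for some $z_0 > 0$. Since the
entries of $\Pi$ are non-negative, so are those of $\hat z_0$. This vector may however have some
of its entries equal to zero. From $\hat z_0$ we construct $z_0$ by adding $\epsilon > 0$ to all the zero entries
of $\hat z_0$. A zero entry $(\Pi z)_i = 0$ with $z > 0$ forces the $i$th row, and hence by symmetry the
$i$th column, of the non-negative matrix $\Pi$ to vanish; the vector $z_0 - \hat z_0$ is therefore
orthogonal to $\mathcal{V}_\rho \oplus \mathcal{V}_{-\rho}$ and will vanish in the iteration of $M^2$. Thus we have
$z_{\mathrm{even}}(z_0) = \hat z_0$ for $z_0 > 0$, as requested.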

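We now prove the last statement. The matrix $\Pi$ and all vectors are non-negative
and $\Pi^2 = \Pi$, and so

$$ \left\| \frac{\Pi \mathbf{1}}{\|\Pi \mathbf{1}\|_2} \right\|_1 = \sqrt{\mathbf{1}^T \Pi^2 \mathbf{1}} $$

and also

$$ \left\| \frac{\Pi z_0}{\|\Pi z_0\|_2} \right\|_1 = \frac{\mathbf{1}^T \Pi^2 z_0}{\sqrt{z_0^T \Pi^2 z_0}}. $$

Applying the Schwarz inequality to $\Pi z_0$ and $\Pi \mathbf{1}$ yields

$$ |\mathbf{1}^T \Pi^2 z_0| \le \sqrt{z_0^T \Pi^2 z_0} \cdot \sqrt{\mathbf{1}^T \Pi^2 \mathbf{1}}, $$

with equality only when $\Pi z_0 = \lambda \Pi \mathbf{1}$ for some $\lambda \in \mathbb{C}$. Hence no element of $Z$ has a
larger 1-norm than $z_{\mathrm{even}}(\mathbf{1})$, and since $\Pi z_0$ and $\Pi \mathbf{1}$ are both real non-negative, equality
forces $\lambda > 0$, so that the two normalized vectors coincide. □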
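A tiny numerical illustration of Theorem 2, with a matrix and starting vectors chosen arbitrarily for the sketch: the symmetric non-negative matrix below has eigenvalues $1$, $-1$ and $0$, so the normalized sequence oscillates; the even and odd subsequences settle on different limits that depend on $z_0$, and the start $z_0 = \mathbf{1}$ gives the even limit of largest 1-norm. The helper names are illustrative.

```python
import math


def normalized_iterates(M, z0, steps):
    """Return [z_0, z_1, ..., z_steps] with z_{k+1} = M z_k / ||M z_k||_2."""
    zs = [[float(x) for x in z0]]
    for _ in range(steps):
        w = [sum(m * x for m, x in zip(row, zs[-1])) for row in M]
        norm = math.sqrt(sum(x * x for x in w))
        zs.append([x / norm for x in w])
    return zs


def one_norm(v):
    return sum(abs(x) for x in v)


# Symmetric, non-negative, eigenvalues 1, -1 and 0: here -rho is an eigenvalue.
M = [[0, 1, 0],
     [1, 0, 0],
     [0, 0, 0]]

zs = normalized_iterates(M, [1, 3, 5], 20)
z_even, z_odd = zs[20], zs[19]                       # the two limits for z_0 = (1, 3, 5)
z_even_ones = normalized_iterates(M, [1, 1, 1], 20)[20]
```

Here $\Pi$ projects onto the first two coordinates, so $z_{\mathrm{even}}(z_0) = \Pi z_0 / \|\Pi z_0\|_2$ drops the third entry of $z_0$ and rescales: $(1, 3, 0)/\sqrt{10}$ for $z_0 = (1, 3, 5)$, against $(1, 1, 0)/\sqrt{2}$, of larger 1-norm, for $z_0 = \mathbf{1}$.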
3. Similarity between vertices in graphs. We now come to a formal definition
of the similarity matrix of two directed graphs $G_A$ and $G_B$. The updating
equation for the similarity matrix is motivated in the introduction and is given by the
linear mapping

$$ X_{k+1} = B X_k A^T + B^T X_k A, \qquad k = 0, 1, \ldots, \tag{3.1} $$

where $A$ and $B$ are the adjacency matrices of $G_A$ and $G_B$. In this updating equation,
the entries of $X_{k+1}$ depend linearly on those of $X_k$. We can make this dependence
more explicit by using the matrix-to-vector operator that develops a matrix into a
vector by taking its columns one by one. This operator, denoted vec, satisfies the
elementary property $\mathrm{vec}(CXD) = (D^T \otimes C)\,\mathrm{vec}(X)$, in which $\otimes$ denotes the Kronecker
product (also called the tensor, direct or categorical product). For a proof of this
property, see Lemma 4.3.1 in [15]. Applying this property to (3.1) we immediately
obtain

$$ x_{k+1} = (A \otimes B + A^T \otimes B^T)\, x_k, \tag{3.2} $$

where $x_k = \mathrm{vec}(X_k)$. This is the format used in the introduction. Combining this
observation with Theorem 2 we deduce the following theorem.



Theorem 3. Let $G_A$ and $G_B$ be two graphs with adjacency matrices $A$ and $B$,
fix some initial positive matrix $Z_0 > 0$ and define

$$ Z_{k+1} = \frac{B Z_k A^T + B^T Z_k A}{\|B Z_k A^T + B^T Z_k A\|_F}, \qquad k = 0, 1, \ldots $$

Then the matrix subsequences $Z_{2k}$ and $Z_{2k+1}$ converge to $Z_{\mathrm{even}}$ and $Z_{\mathrm{odd}}$. Moreover,
among all the matrices in the set

$$ \{ Z_{\mathrm{even}}(Z_0), Z_{\mathrm{odd}}(Z_0) : Z_0 > 0 \} $$

the matrix $Z_{\mathrm{even}}(\mathbf{1})$ is the unique matrix of largest 1-norm.

In order to be consistent with the vector norm appearing in Theorem 2, the matrix
norm $\|\cdot\|_F$ we use here is the square root of the sum of all squared entries (this
norm is known as the Euclidean or Frobenius norm), and the 1-norm is the sum
of the magnitudes of all entries. One can also provide a definition of the set $Z$ in
terms of one of its extremal properties.



Theorem 4. Let $G_A$ and $G_B$ be two graphs with adjacency matrices $A$ and $B$
and consider the notation of Theorem 3. The set

$$ Z = \{ Z_{\mathrm{even}}(Z_0), Z_{\mathrm{odd}}(Z_0) : Z_0 > 0 \} $$

and the set of all positive matrices that maximize the expression

$$ \frac{\|B X A^T + B^T X A\|_F}{\|X\|_F} $$

are equal. Moreover, among all matrices in this set there is a unique matrix $S$ whose
1-norm is maximal. This matrix can be obtained as

$$ S = \lim_{k\to+\infty} Z_{2k} $$

for $Z_0 = \mathbf{1}$.

Proof. The above expression can also be written as $\|L(X)\|_2 / \|X\|_2$, whose maximum over
$X \ne 0$ is the induced 2-norm of the linear mapping $L$ defined by $L(X) = B X A^T + B^T X A$. It
is well known [14] that each dominant eigenvector $X$ of $L^2$ is a maximizer of this
expression. It was shown above that $S$ is the unique matrix of largest 1-norm in that
set. □

We take the matrix $S$ appearing in this theorem as the definition of the similarity matrix
between $G_A$ and $G_B$. Notice that it follows from this definition that the similarity
matrix between $G_B$ and $G_A$ is the transpose of the similarity matrix between $G_A$ and $G_B$.
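The vec/Kronecker identity used to pass from (3.1) to (3.2) is easy to check numerically. The sketch below (pure Python, small integer matrices chosen arbitrarily, helper names hypothetical) verifies $\mathrm{vec}(CXD) = (D^T \otimes C)\,\mathrm{vec}(X)$ and that one step of (3.1), once vectorized, coincides with one step of (3.2).

```python
def vec(X):
    """Stack the columns of X (a list of rows) into a single vector."""
    return [X[i][j] for j in range(len(X[0])) for i in range(len(X))]


def kron(P, Q):
    """Kronecker product of P and Q, both given as lists of rows."""
    return [[P[i][j] * Q[k][l] for j in range(len(P[0])) for l in range(len(Q[0]))]
            for i in range(len(P)) for k in range(len(Q))]


def transpose(P):
    return [list(row) for row in zip(*P)]


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def matvec(P, v):
    return [sum(p * x for p, x in zip(row, v)) for row in P]


C = [[1, 2], [0, 1], [3, 0]]            # 3 x 2
X = [[1, 4], [2, 5]]                    # 2 x 2
D = [[2, 0, 1], [1, 1, 0]]              # 2 x 3
lhs = vec(matmul(matmul(C, X), D))
rhs = matvec(kron(transpose(D), C), vec(X))

A = [[0, 1], [0, 0]]                    # adjacency matrix of G_A (hub -> authority)
B = [[0, 1, 1], [0, 0, 1], [1, 0, 0]]   # adjacency matrix of G_B
Y = [[1, 2], [3, 4], [5, 6]]            # an n_B x n_A score matrix
step = [[u + v for u, v in zip(r1, r2)]
        for r1, r2 in zip(matmul(matmul(B, Y), transpose(A)),
                          matmul(matmul(transpose(B), Y), A))]       # one step of (3.1)
K = [[u + v for u, v in zip(r1, r2)]
     for r1, r2 in zip(kron(A, B), kron(transpose(A), transpose(B)))]  # matrix of (3.2)
```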

A direct algorithmic transcription of the definition leads to an approximation
algorithm for computing similarity matrices of graphs:

1. Set $Z_0 = \mathbf{1}$.

2. Iterate an even number of times

$$ Z_{k+1} = \frac{B Z_k A^T + B^T Z_k A}{\|B Z_k A^T + B^T Z_k A\|_F} $$

and stop upon convergence of $Z_k$.

3. Output $S$, the last value of $Z_k$.
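A minimal pure-Python sketch of the three steps above, assuming dense adjacency matrices stored as lists of rows. Convergence is tested by comparing successive even iterates in Frobenius norm against a tolerance `tol`; this stopping rule, the name `similarity_matrix` and the cap `max_pairs` are illustrative choices rather than part of the original algorithm.

```python
import math


def transpose(P):
    return [list(row) for row in zip(*P)]


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def similarity_matrix(A, B, tol=1e-12, max_pairs=5000):
    """Approximate S = lim Z_2k with Z_{k+1} = (B Z_k A^T + B^T Z_k A) / ||.||_F and Z_0 = 1.

    A is the n_A x n_A adjacency matrix of G_A and B the n_B x n_B adjacency matrix
    of G_B; the result is n_B x n_A. Assumes both graphs have at least one edge,
    otherwise the update is zero and cannot be normalized.
    """
    nA, nB = len(A), len(B)
    At, Bt = transpose(A), transpose(B)

    def step(Z):
        Y = [[u + v for u, v in zip(r1, r2)]
             for r1, r2 in zip(matmul(matmul(B, Z), At), matmul(matmul(Bt, Z), A))]
        norm = math.sqrt(sum(x * x for row in Y for x in row))
        return [[x / norm for x in row] for row in Y]

    Z = [[1.0] * nA for _ in range(nB)]          # step 1: Z_0 = 1
    for _ in range(max_pairs):                   # step 2: advance two steps at a time
        Z_next = step(step(Z))
        diff = math.sqrt(sum((x - y) ** 2
                             for rx, ry in zip(Z_next, Z) for x, y in zip(rx, ry)))
        Z = Z_next
        if diff < tol:                           # successive even iterates agree
            break
    return Z                                     # step 3: output S
```

With $A$ the adjacency matrix of hub → authority, the two columns of the returned matrix are the hub and authority scores of the vertices of $G_B$, and swapping the two graphs returns the transpose, as noted after Theorem 4.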

This algorithm is a matrix analog to the classical power method (see [21]) to
compute a dominant eigenvector of a matrix. The complexity of this algorithm is easy to
estimate. Let $G_A$, $G_B$ be two graphs with $n_A$, $n_B$ vertices and $e_A$, $e_B$ edges,
respectively. Then the products $B Z_k$ and $B^T Z_k$ require fewer than $2 n_A e_B$ additions and
multiplications each, while the subsequent products $(B Z_k) A^T$ and $(B^T Z_k) A$ require
fewer than $2 n_B e_A$ additions and multiplications each. The sum and the calculation of
the Frobenius norm require $2 n_A n_B$ additions and multiplications, while the scaling
requires one division and $n_A n_B$ multiplications. Let us define

$$ \alpha_A := e_A / n_A, \qquad \alpha_B := e_B / n_B $$

as the average number of non-zero elements per row of $A$ and $B$, respectively; then
the total complexity per iteration step is of the order of $4(\alpha_A + \alpha_B) n_A n_B$ additions
and multiplications. As was shown in Theorem 2, the convergence of the even iterates
of the above recurrence is linear with ratio $(\mu/\rho)^2$. The number of floating point
operations needed to compute $S$ to $\epsilon$ accuracy with the power method is therefore of
the order of

$$ 4 n_A n_B (\alpha_A + \alpha_B) \frac{\log \epsilon}{\log \mu - \log \rho}. $$

Other sparse matrix methods could be used here, but we do not address such 
algorithmic aspects in this paper. For particular classes of adjacency matrices, one 
can compute the similarity matrix S directly from the dominant invariant subspaces 
of matrices of the size of A or B. We provide explicit expressions for a few such classes 
in the next section. 
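The operation count above can be turned into a back-of-the-envelope estimate. In this sketch the ratio $\mu/\rho$ is a hypothetical input (in practice it is rarely known in advance), and the function simply multiplies the per-step count $4(\alpha_A + \alpha_B) n_A n_B$ by the number of steps $\log\epsilon / \log(\mu/\rho)$ needed for the even iterates to reach accuracy $\epsilon$.

```python
import math


def estimated_cost(n_A, n_B, e_A, e_B, mu_over_rho, eps):
    """Order-of-magnitude operation count of the power-method iteration.

    Returns (operations per step, number of steps, total operations).
    """
    alpha_A, alpha_B = e_A / n_A, e_B / n_B          # average non-zeros per row
    per_step = 4 * (alpha_A + alpha_B) * n_A * n_B
    steps = math.log(eps) / math.log(mu_over_rho)    # (mu/rho)^steps = eps
    return per_step, steps, per_step * steps


per_step, steps, total = estimated_cost(10, 10, 20, 20, 0.5, 2 ** -10)
```

For instance, with $n_A = n_B = 10$, $e_A = e_B = 20$, $\mu/\rho = 1/2$ and $\epsilon = 2^{-10}$, this gives 1600 operations per step, 10 steps and about 16000 operations in total.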
4. Hubs, authorities and central scores. As explained in the introduction,
the hub and authority scores of a graph can be expressed in terms of its adjacency
matrix.



Theorem 5. Let $B$ be the adjacency matrix of the graph $G_B$. The normalized
hub and authority scores of the vertices of $G_B$ are given by the normalized dominant
eigenvectors of the matrices $BB^T$ and $B^TB$, provided the corresponding Perron root
is of multiplicity 1. Otherwise, they are the normalized projections of the vector $\mathbf{1}$ on the
respective dominant invariant subspaces.



The condition on the multiplicity of the Perron root is not superfluous. Indeed,
even for connected graphs, $BB^T$ and $B^TB$ may have multiple dominant roots: for
cycle graphs, for example, both $BB^T$ and $B^TB$ are the identity matrix.
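The role of the multiplicity condition can be seen on a directed 3-cycle, for which $BB^T = B^TB = I$. The sketch below (function name and starting vectors are illustrative) runs an even number of steps of the hub/authority iteration from two different positive starting vectors.

```python
import math


def hub_authority_from(B, h0, a0, steps=50):
    """Even iterate of the normalized hub/authority iteration started from (h0, a0)."""
    n = len(B)
    h, a = [float(x) for x in h0], [float(x) for x in a0]
    steps += steps % 2  # keep an even number of steps
    for _ in range(steps):
        new_h = [sum(B[j][i] * a[i] for i in range(n)) for j in range(n)]
        new_a = [sum(B[i][j] * h[i] for i in range(n)) for j in range(n)]
        norm = math.sqrt(sum(x * x for x in new_h + new_a))
        h, a = [x / norm for x in new_h], [x / norm for x in new_a]
    return h, a


cycle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]   # 0 -> 1 -> 2 -> 0, so B B^T = B^T B = I
h1, a1 = hub_authority_from(cycle, [1, 1, 1], [1, 1, 1])
h2, a2 = hub_authority_from(cycle, [1, 2, 3], [4, 5, 6])
```

Starting from $\mathbf{1}$, all vertices receive the same hub and authority score $1/\sqrt{6}$, the normalized projection of $\mathbf{1}$; starting from another positive vector, the iteration reproduces that vector (normalized) after every two steps, so the limit depends on the start.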

Another interesting structure graph is the path graph of length three:

1 → 2 → 3

As for the hub and authority scores, we can give an explicit expression for the
similarity score with vertex 2, a score that we will call the central score. This central
score has been successfully used for the purpose of automatic extraction of synonyms
in a dictionary. This application is described in more detail in the final section of this paper.

Theorem 6. Let $B$ be the adjacency matrix of the graph $G_B$. The normalized
central scores of the vertices of $G_B$ are given by the normalized dominant eigenvector
of the matrix $B^TB + BB^T$, provided the corresponding Perron root is of multiplicity 1.
Otherwise, they are given by the normalized projection of the vector $\mathbf{1}$ on the dominant
invariant subspace.

Proof. The corresponding matrix $M$ is as follows:

$$ M = \begin{bmatrix} 0 & B & 0 \\ B^T & 0 & B \\ 0 & B^T & 0 \end{bmatrix}, $$

and so

$$ M^2 = \begin{bmatrix} B B^T & 0 & B B \\ 0 & B^T B + B B^T & 0 \\ B^T B^T & 0 & B^T B \end{bmatrix}, $$

and the result then follows from the definition of the similarity scores, provided the
central matrix $B^TB + BB^T$ has a dominant root $\rho^2$ of $M^2$. This can be seen as
follows. The matrix $M$ can be permuted to

$$ M = P^T \begin{bmatrix} 0 & E \\ E^T & 0 \end{bmatrix} P, \qquad \text{where } E := \begin{bmatrix} B \\ B^T \end{bmatrix}. $$

Let now $V$ and $U$ be orthonormal bases for the dominant right and left singular
subspaces of $E$:

$$ E V = \rho U, \qquad E^T U = \rho V; \tag{4.1} $$

then clearly $V$ and $U$ are also bases for the dominant invariant subspaces of $E^TE$ and
$EE^T$, respectively, since

$$ E^T E V = \rho^2 V, \qquad E E^T U = \rho^2 U. $$

Moreover,

$$ P M^2 P^T = \begin{bmatrix} E E^T & 0 \\ 0 & E^T E \end{bmatrix}, $$

and the projectors associated to the dominant eigenvalues of $E^TE$ and $EE^T$ are
respectively $\Pi_v := V V^T$ and $\Pi_u := U U^T$. The projector $\Pi$ of $M^2$ is then nothing
but $P^T \mathrm{diag}\{\Pi_u, \Pi_v\} P$, and hence the sub-vectors of $\Pi \mathbf{1}$ are the vectors $\Pi_u \mathbf{1}$ and $\Pi_v \mathbf{1}$,
which can be computed from the smaller matrices $EE^T$ or $E^TE$. Since $E^TE =
B^TB + BB^T$, the central vector $\Pi_v \mathbf{1}$ is the middle vector of $\Pi \mathbf{1}$. It is worth pointing
out that (4.1) also yields a relation between the two smaller projectors:

$$ \rho^2 \Pi_v = E^T \Pi_u E, \qquad \rho^2 \Pi_u = E \Pi_v E^T. $$
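Theorem 6 can be checked numerically on a small graph. The sketch below (pure Python; the graph, step counts and function names are illustrative) computes the central scores by power iteration on $B^TB + BB^T$ from $\mathbf{1}$, and separately runs the similarity iteration with the structure graph 1 → 2 → 3; the normalized middle column of the latter should agree with the former.

```python
import math


def transpose(P):
    return [list(row) for row in zip(*P)]


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def normalize(v):
    s = math.sqrt(sum(x * x for x in v))
    return [x / s for x in v]


def central_scores(B, steps=200):
    """Power iteration on B^T B + B B^T started from the all-ones vector."""
    Bt = transpose(B)
    C = [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(matmul(Bt, B), matmul(B, Bt))]
    v = [1.0] * len(B)
    for _ in range(steps):
        v = normalize([sum(c * x for c, x in zip(row, v)) for row in C])
    return v


def path_similarity(B, steps=200):
    """Even iterate Z_steps of the similarity iteration with structure graph 1 -> 2 -> 3."""
    A = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
    At, Bt = transpose(A), transpose(B)
    Z = [[1.0] * 3 for _ in range(len(B))]
    for _ in range(steps):
        Y = [[u + w for u, w in zip(r1, r2)]
             for r1, r2 in zip(matmul(matmul(B, Z), At), matmul(matmul(Bt, Z), A))]
        s = math.sqrt(sum(x * x for row in Y for x in row))
        Z = [[x / s for x in row] for row in Y]
    return Z


# Edges 0 -> 1, 4 -> 1, 1 -> 2 and 1 -> 3: vertex 1 sits in the middle.
B = [[0, 1, 0, 0, 0],
     [0, 0, 1, 1, 0],
     [0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0],
     [0, 1, 0, 0, 0]]
central = central_scores(B)
S = path_similarity(B)
middle = normalize([row[1] for row in S])
```

In this example the central scores concentrate on vertex 1 of $G_B$; vertices 0 and 4, which point to it, are similar to the first vertex of the structure graph, and vertices 2 and 3, which it points to, are similar to the third.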



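In order to illustrate that path graphs of length 3 may have an advantage over
the hub-authority structure graph, we consider here the special case of the "directed
bow-tie graph" $G_B$ represented in Figure 4.1. If we label the center vertex first, then
label the $m$ left vertices and finally the $n$ right vertices, the adjacency matrix for this
graph is given by (with $0_{p\times q}$ and $\mathbf{1}_{p\times q}$ denoting blocks of zeros and of ones)

$$ B = \begin{bmatrix} 0 & 0_{1\times m} & \mathbf{1}_{1\times n} \\ \mathbf{1}_{m\times 1} & 0_{m\times m} & 0_{m\times n} \\ 0_{n\times 1} & 0_{n\times m} & 0_{n\times n} \end{bmatrix}. $$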



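The matrix $B^TB + BB^T$ is equal to

$$ B^T B + B B^T = \begin{bmatrix} m + n & 0 & 0 \\ 0 & \mathbf{1}_{m\times m} & 0 \\ 0 & 0 & \mathbf{1}_{n\times n} \end{bmatrix}, $$

and, following Theorem 6, the Perron root of $M$ is equal to $\rho = \sqrt{n + m}$ and the
similarity matrix is given by the $(1 + m + n) \times 3$ matrix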



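$$ S = \frac{1}{\sqrt{2(n+m)}} \begin{bmatrix} 0 & \sqrt{n+m} & 0 \\ 1 & 0 & 0 \\ \vdots & \vdots & \vdots \\ 1 & 0 & 0 \\ 0 & 0 & 1 \\ \vdots & \vdots & \vdots \\ 0 & 0 & 1 \end{bmatrix}. $$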



12 V. BLONDEL, A. GAJARDO, M. HEYMANS, P. SENELLART, AND P. VAN DOOREN 




Fig. 4.1. A directed bow-tie graph. Kleinberg's hub score of the center vertex is equal to 1/√2
if m > n and to if m < n. The central score of this vertex is equal to 1/√2 independently of the
relative values of m and n.



This result holds irrespective of the relative values of $m$ and $n$. Let us call the three
vertices of the path graph 1, center and 3, respectively. One could view a center
as a vertex through which much information is passed on. This similarity matrix $S$
indicates that vertex 1 of $G_B$ looks very much like a center, the left vertices of $G_B$
look like 1's, and the right vertices of $G_B$ look like 3's. If, on the other hand, we
analyze the graph $G_B$ with the hub-authority structure graph of Kleinberg, then the
similarity scores $S$ differ for $m < n$ and $m > n$:



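$$ S = \frac{1}{\sqrt{m+1}} \begin{bmatrix} 0 & 1 \\ 1 & 0 \\ \vdots & \vdots \\ 1 & 0 \\ 0 & 0 \\ \vdots & \vdots \\ 0 & 0 \end{bmatrix} \ \text{for } m > n, \qquad S = \frac{1}{\sqrt{n+1}} \begin{bmatrix} 1 & 0 \\ 0 & 0 \\ \vdots & \vdots \\ 0 & 0 \\ 0 & 1 \\ \vdots & \vdots \\ 0 & 1 \end{bmatrix} \ \text{for } m < n. $$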



This shows a weakness of this structure graph, since the vertices of $G_B$ that
deserve the label of hub or authority completely change between $m > n$ and $m < n$.

5. Self-similarity matrix of a graph. When we compare two equal graphs
$G_A = G_B = G$, the similarity matrix $S$ is a square matrix whose entries are similarity
scores between vertices of $G$; this matrix is the self-similarity matrix of $G$. Various
graphs and their corresponding self-similarity matrices are represented in Figure 5.1.
In general, we expect vertices to have a high similarity score with themselves; that is,
we expect the diagonal entries of self-similarity matrices to be large. We prove in the
next theorem that the largest entry of a self-similarity matrix always appears on the
diagonal and that, except for trivial cases, the diagonal elements of a self-similarity
matrix are non-zero. As can be seen from elementary examples, it is however not true
that diagonal elements always dominate all elements on the same row and column.



Theorem 7. The self-similarity matrix of a graph is positive semi-definite. In
particular, the largest element of the matrix appears on the diagonal, and if a diagonal
entry is equal to zero, the corresponding row and column are equal to zero.

Proof. Since $A = B$, the iteration of the normalized matrices $Z_k$ now becomes

$$ Z_{k+1} = \frac{A Z_k A^T + A^T Z_k A}{\|A Z_k A^T + A^T Z_k A\|_F}, \qquad Z_0 = \mathbf{1}. $$
Fig. 5.1. Graphs and their corresponding self-similarity matrix. The self-similarity matrix of
a graph gives a measure of how similar vertices are to each other.



Since $Z_0 = \mathbf{1} = \mathbf{1}\mathbf{1}^T$ is positive semi-definite, and since $A Z A^T$ and $A^T Z A$ are positive
semi-definite whenever $Z$ is, the scaled sum of these two matrices is again positive
semi-definite, and it follows by induction that all matrices $Z_k$ are positive semi-definite.
Moreover, positive semi-definite matrices form a closed set, and hence the limit $S$ is also
positive semi-definite. The properties mentioned in the statement of the theorem are
well-known properties of positive semi-definite matrices. □
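The argument of the proof can be observed numerically: every iterate $Z_k$ of the self-similarity iteration is symmetric positive semi-definite, so its largest entry lies on the diagonal. The sketch below uses an arbitrary graph and only spot-checks positive semi-definiteness through a few quadratic forms, which illustrates the claim but does not prove it; the function names are illustrative.

```python
import math


def transpose(P):
    return [list(row) for row in zip(*P)]


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def self_similarity(A, steps=200):
    """Even iterate of Z_{k+1} = (A Z A^T + A^T Z A) / ||.||_F from Z_0 = 1."""
    At = transpose(A)
    Z = [[1.0] * len(A) for _ in range(len(A))]
    for _ in range(steps):
        Y = [[u + w for u, w in zip(r1, r2)]
             for r1, r2 in zip(matmul(matmul(A, Z), At), matmul(matmul(At, Z), A))]
        s = math.sqrt(sum(x * x for row in Y for x in row))
        Z = [[x / s for x in row] for row in Y]
    return Z


def quadratic_form(S, x):
    return sum(x[i] * S[i][j] * x[j] for i in range(len(x)) for j in range(len(x)))


# A 3-cycle 0 -> 1 -> 2 -> 0 with an extra edge 2 -> 3.
A = [[0, 1, 0, 0], [0, 0, 1, 0], [1, 0, 0, 1], [0, 0, 0, 0]]
S = self_similarity(A)
```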

When vertices of a graph are similar to each other, such as in cycle graphs, we 
expect to have a self-similarity matrix with all entries equal. This is indeed the case 
as will be proved in the next section. We can also derive explicit expressions for the 
self-similarity matrices of path graphs. 

Theorem 8. The self-similarity matrix of a path graph is a diagonal matrix.

Proof. The product of two path graphs is a disjoint union of path graphs, and
so the matrix $M$ corresponding to this graph can be permuted to a block diagonal
arrangement of Jacobi matrices

$$ J_j = \begin{bmatrix} 0 & 1 & & \\ 1 & 0 & \ddots & \\ & \ddots & \ddots & 1 \\ & & 1 & 0 \end{bmatrix} $$

of dimension $j = 1, \ldots, \ell$, where $\ell$ is the dimension of the given path graph. The
largest of these blocks corresponds to the Perron root $\rho$ of $M$. There is only one
largest block and its vertices correspond to the diagonal elements of $S$. It is well known
that $\rho = 2\cos(\pi/(\ell+1))$, but $M$ has both eigenvalues $\pm\rho$, and the corresponding
eigenvectors have the elements $(\pm 1)^j \sin(j\pi/(\ell+1))$, $j = 1, \ldots, \ell$, from which $\Pi \mathbf{1}$ can easily
be computed. □
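For a path graph the proof also yields the diagonal explicitly. The sketch below (illustrative names; 400 steps chosen so that the slowly decaying off-diagonal terms become negligible) computes the self-similarity matrix of the path 0 → 1 → 2 → 3 and compares its diagonal with the normalized vector of entries $\sin(j\pi/5)$, $j = 1, \ldots, 4$. That is what $\Pi \mathbf{1}$ reduces to for $\ell = 4$, because the alternating sum of these sines vanishes, so the eigenvector for $-\rho$ does not contribute.

```python
import math


def transpose(P):
    return [list(row) for row in zip(*P)]


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def self_similarity(A, steps=400):
    """Even iterate of Z_{k+1} = (A Z A^T + A^T Z A) / ||.||_F from Z_0 = 1."""
    At = transpose(A)
    Z = [[1.0] * len(A) for _ in range(len(A))]
    for _ in range(steps):
        Y = [[u + w for u, w in zip(r1, r2)]
             for r1, r2 in zip(matmul(matmul(A, Z), At), matmul(matmul(At, Z), A))]
        s = math.sqrt(sum(x * x for row in Y for x in row))
        Z = [[x / s for x in row] for row in Y]
    return Z


path4 = [[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 0, 0]]   # 0 -> 1 -> 2 -> 3
S = self_similarity(path4)
sines = [math.sin(j * math.pi / 5) for j in range(1, 5)]
expected_diag = [x / math.sqrt(sum(y * y for y in sines)) for x in sines]
```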

6. Similarity matrices of rank one. In this section we describe two classes of
graphs that lead to similarity matrices that have rank one. We consider the case when
one of the two graphs is regular (a graph is regular if the in-degrees of its vertices are
all equal and the out-degrees are also equal), and the case when the adjacency matrix
of one of the graphs is normal (a matrix $A$ is normal if it satisfies $AA^T = A^TA$).
In both cases we prove that the similarity matrix has rank one. Graphs that are
not directed have a symmetric adjacency matrix, and symmetric matrices are normal;
therefore graphs that are not directed always generate similarity matrices that have
rank one.

Theorem 9. Let Ga, Gb be two graphs with adjacency matrices A and B and
assume that Ga is regular. Then the similarity matrix between Ga and Gb is a rank
one matrix of the form

$$S = \alpha\, v\, \mathbf{1}^T,$$

where v = Π1 is the projection of 1 on the dominant invariant subspace of (B + B^T)^2,
and α is a scaling factor.

Proof. It is known (see, e.g., [5]) that a regular graph Ga has an adjacency
matrix A with Perron root of algebraic multiplicity 1 and that the vector 1 is the
corresponding Perron vector of both A and A^T. It easily follows from this that each
matrix Z_k of the iteration defining the similarity matrix is of rank one and of the type
v_k 1^T/√n_A, where

$$v_{k+1} = \frac{(B + B^T)\, v_k}{\|(B + B^T)\, v_k\|_2}, \qquad v_0 = \mathbf{1}.$$

The even iterates of this sequence clearly converge to Π1/‖Π1‖_2, where Π is the
projector on the dominant invariant subspace of (B + B^T)^2. □
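Here is a small numerical sketch of Theorem 9 in Python with NumPy. G_A is a directed 4-cycle, which is regular. G_B is an arbitrary irregular graph chosen only for illustration, and `similarity_matrix` is a hypothetical helper that iterates the normalized recursion from the all-ones matrix. We approximate the vector v by power iteration with (B + B^T)^2 started at 1. This converges to the normalized projection of 1 on the dominant invariant subspace whenever that projection is nonzero.

```python
import numpy as np


def similarity_matrix(A, B, tol=1e-12, max_iter=1000):
    """Sketch: limit of the normalized even iterates of
    Z <- (B Z A^T + B^T Z A) / ||.||_F, starting from the all-ones matrix."""
    Z = np.ones((B.shape[0], A.shape[0]))
    for _ in range(max_iter):
        Z_next = Z
        for _ in range(2):
            Z_next = B @ Z_next @ A.T + B.T @ Z_next @ A
            Z_next /= np.linalg.norm(Z_next)
        if np.linalg.norm(Z_next - Z) < tol:
            return Z_next
        Z = Z_next
    return Z


# G_A: directed 4-cycle, a regular graph (all in- and out-degrees equal 1).
A = np.zeros((4, 4))
for i in range(4):
    A[i, (i + 1) % 4] = 1.0

# G_B: hypothetical irregular graph with edges 0->1, 1->2, 2->0, 2->3.
B = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (2, 0), (2, 3)]:
    B[i, j] = 1.0

S = similarity_matrix(A, B)
singular_values = np.linalg.svd(S, compute_uv=False)
print(singular_values[1] / singular_values[0])   # numerically zero: rank one

# v: normalized projection of 1 on the dominant invariant subspace of
# (B + B^T)^2, approximated by power iteration with (B + B^T)^2.
C = B + B.T
v = np.ones(4)
for _ in range(200):
    v = C @ (C @ v)
    v /= np.linalg.norm(v)

# Every column of S is a positive multiple of v, i.e. S = alpha * v 1^T.
columns = S / np.linalg.norm(S, axis=0)
print(np.allclose(columns, v[:, None], atol=1e-8))
```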

Cycle graphs have an adjacency matrix A that satisfies AA^T = I. This property
corresponds to the fact that, in a cycle graph, all forward-backward paths from a
vertex return to that vertex. More generally, we consider in the next theorem graphs
that have a normal adjacency matrix A, i.e., an adjacency matrix such that
AA^T = A^T A.

Theorem 10. Let Ga and Gb be two graphs of adjacency matrices A and B 
and assume that one of the adjacency matrices is normal. Then the similarity matrix 
between Ga and Gb has rank one. 

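Proof. Let A be the normal matrix and let α be its Perron root. Then there exists
a unitary matrix U which diagonalizes both A and A^T:

$$A = U \Lambda U^*, \qquad A^T = U \bar{\Lambda} U^*,$$

and the columns u_i, i = 1, ..., n_A, of U are their common eigenvectors (notice that u_i
is real only if λ_i is real as well). Therefore, with I the identity matrix of order n_B,

$$(U^* \otimes I)\, M\, (U \otimes I) = (U^* \otimes I)(A \otimes B + A^T \otimes B^T)(U \otimes I) = \Lambda \otimes B + \bar{\Lambda} \otimes B^T,$$

and the eigenvalues of M are those of the Hermitian matrices

$$H_i := \lambda_i B + \bar{\lambda}_i B^T,$$

whose eigenvalues are obviously bounded in modulus by |λ_i|β, where β is the Perron
root of (B + B^T). Moreover, if v_j^{(i)}, j = 1, ..., n_B, are the eigenvectors of H_i, then
those of M are given by

$$u_i \otimes v_j^{(i)}, \qquad i = 1, \ldots, n_A, \quad j = 1, \ldots, n_B,$$

and they can again only be real if λ_i is real. Since we want real eigenvectors
corresponding to extremal eigenvalues of M, we only need to consider the largest real
eigenvalues of A, i.e., ±α, where α is the Perron root of A. Since A is normal, its real
eigenvectors are also eigenvectors of A^T. Therefore, denoting by Π_α and Π_{-α} the
projectors on the eigenspaces of A associated with α and -α,

$$A\Pi_{\alpha} = A^T\Pi_{\alpha} = \alpha\Pi_{\alpha}, \qquad A\Pi_{-\alpha} = A^T\Pi_{-\alpha} = -\alpha\Pi_{-\alpha}.$$

It then follows that, with Π_β the projector on the eigenspace of B + B^T associated
with β,

$$(A \otimes B + A^T \otimes B^T)^2 \big((\Pi_{\alpha} + \Pi_{-\alpha}) \otimes \Pi_{\beta}\big) = \alpha^2 (\Pi_{\alpha} + \Pi_{-\alpha}) \otimes \beta^2 \Pi_{\beta},$$

and hence Π := (Π_α + Π_{-α}) ⊗ Π_β is the projector of the dominant root α^2β^2 of M^2.
Applying this projector to the vector 1 yields the vector

$$(\Pi_{\alpha} + \Pi_{-\alpha})\mathbf{1} \otimes \Pi_{\beta}\mathbf{1},$$

which corresponds to the rank one matrix

$$S = (\Pi_{\beta}\mathbf{1})\big((\Pi_{\alpha} + \Pi_{-\alpha})\mathbf{1}\big)^T.$$

□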
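The following sketch shows Theorem 10 at work numerically, in Python with NumPy. The helper names and both test graphs are our own illustrative choices. For G_A it takes an undirected graph that is not regular, so Theorem 9 does not apply, but A is symmetric and hence normal. It then checks that the ratio of the two largest singular values of S is at rounding level.

```python
import numpy as np


def similarity_matrix(A, B, tol=1e-12, max_iter=1000):
    """Sketch: limit of the normalized even iterates of
    Z <- (B Z A^T + B^T Z A) / ||.||_F, starting from the all-ones matrix."""
    Z = np.ones((B.shape[0], A.shape[0]))
    for _ in range(max_iter):
        Z_next = Z
        for _ in range(2):
            Z_next = B @ Z_next @ A.T + B.T @ Z_next @ A
            Z_next /= np.linalg.norm(Z_next)
        if np.linalg.norm(Z_next - Z) < tol:
            return Z_next
        Z = Z_next
    return Z


# G_A: hypothetical undirected graph (triangle 0-1-2 plus pendant vertex 3).
# Its adjacency matrix is symmetric, hence normal, but the graph is not regular.
A = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (0, 2), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
assert np.allclose(A @ A.T, A.T @ A)

# G_B: directed path 0 -> 1 -> 2 -> 3 (an arbitrary choice).
B = np.zeros((4, 4))
for i in range(3):
    B[i, i + 1] = 1.0

S = similarity_matrix(A, B)
singular_values = np.linalg.svd(S, compute_uv=False)
print(singular_values[1] / singular_values[0])   # numerically zero: rank one
```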

When one of the graphs Ga or Gb is regular or has a normal adjacency matrix,
the resulting similarity matrix S has rank one. Adjacency matrices of regular graphs
and normal matrices have the property that the projector Π on the invariant subspace
corresponding to the Perron root of A is also the projector on the corresponding
invariant subspace of A^T. As a consequence, ρ(A + A^T) = 2ρ(A). In this context we
formulate the following conjecture.

Conjecture 11. The similarity matrix of two graphs has rank one if and only if
one of the graphs has the property that its adjacency matrix D is such that
ρ(D + D^T) = 2ρ(D).
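The spectral condition in Conjecture 11 is easy to test for a given adjacency matrix. The following Python/NumPy sketch only evaluates that condition and does not test the conjecture itself. The helper names are hypothetical, and the relative tolerance is an arbitrary choice. A directed cycle satisfies the condition, consistent with Theorem 9. A directed path does not, consistent with Theorem 8: its self-similarity matrix is diagonal with positive diagonal, hence not of rank one.

```python
import numpy as np


def spectral_radius(X):
    """Largest modulus among the (possibly complex) eigenvalues of X."""
    return np.abs(np.linalg.eigvals(X)).max()


def has_conjectured_property(D, rtol=1e-9):
    """True if rho(D + D^T) equals 2 rho(D) up to rounding."""
    return bool(np.isclose(spectral_radius(D + D.T), 2 * spectral_radius(D), rtol=rtol))


def cycle(n):
    return np.roll(np.eye(n), 1, axis=1)   # edges i -> i+1 (mod n)


def path(n):
    return np.eye(n, k=1)                  # edges i -> i+1


print(has_conjectured_property(cycle(5)))  # regular graph
print(has_conjectured_property(path(4)))   # path graph
```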

7. Application to automatic extraction of synonyms. We illustrate in this
last section the use of the central similarity score introduced in Section 4 for the
automatic extraction of synonyms from a monolingual dictionary. Our method uses a
graph constructed from the dictionary and is based on the assumption that synonyms
have many words in common in their definitions and appear together in the definitions
of many words. We briefly outline our method below and then discuss the results
obtained with the Webster dictionary on four query words. The application given
in this section is based on [S], to which we refer the interested reader for a complete
description.

The method is fairly simple. Starting from a dictionary, we first construct the
associated dictionary graph G; each word of the dictionary is a vertex of the graph and
there is an edge from u to w if w appears in the definition of u. Then, associated with
a given query word w, we construct a neighborhood graph G_w, which is the subgraph
of G whose vertices are pointed to by w or are pointing to w (see, e.g., Figure 7.1).
Finally, we compute the similarity score of the vertices of the graph G_w with the
central vertex in the structure graph

1 → 2 → 3

and rank the words by decreasing score. Because of the way the neighborhood graph
is constructed, we expect the words with highest central score to be good candidates
for synonymy.
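The pipeline above can be sketched in a few lines of Python with NumPy. This is a toy illustration under stated assumptions, not the authors' implementation. The six-entry `DICTIONARY` is invented. Words are matched with a simple lowercase regular expression and no stemming, and self-references are ignored. `similarity_matrix`, `dictionary_graph`, `neighborhood_graph` and `central_scores` are hypothetical helper names. The central score of a word is its similarity with vertex 2 of the structure graph 1 → 2 → 3, which is column index 1 of S.

```python
import re

import numpy as np


def similarity_matrix(A, B, tol=1e-12, max_iter=1000):
    """Sketch: limit of the normalized even iterates of
    Z <- (B Z A^T + B^T Z A) / ||.||_F, starting from the all-ones matrix."""
    Z = np.ones((B.shape[0], A.shape[0]))
    for _ in range(max_iter):
        Z_next = Z
        for _ in range(2):
            Z_next = B @ Z_next @ A.T + B.T @ Z_next @ A
            Z_next /= np.linalg.norm(Z_next)
        if np.linalg.norm(Z_next - Z) < tol:
            return Z_next
        Z = Z_next
    return Z


# Hypothetical toy dictionary (headword -> definition); not the Webster data.
DICTIONARY = {
    "vanish": "to disappear or pass from sight",
    "disappear": "to vanish or cease to be seen",
    "fade": "to vanish slowly or disappear gradually",
    "cease": "to stop or come to an end",
    "pass": "to go by or disappear",
    "sight": "the power of seeing",
}


def dictionary_graph(dictionary):
    """Edge u -> w whenever headword w appears in the definition of u."""
    words = sorted(dictionary)
    index = {w: k for k, w in enumerate(words)}
    G = np.zeros((len(words), len(words)))
    for u, definition in dictionary.items():
        for w in set(re.findall(r"[a-z]+", definition.lower())):
            if w in index and w != u:          # assumption: ignore self-references
                G[index[u], index[w]] = 1.0
    return words, G


def neighborhood_graph(words, G, query):
    """Subgraph on the query, the words it points to and the words pointing to it."""
    q = words.index(query)
    keep = sorted({q} | set(np.flatnonzero(G[q])) | set(np.flatnonzero(G[:, q])))
    return [words[k] for k in keep], G[np.ix_(keep, keep)]


def central_scores(Gw):
    """Similarity of each vertex of Gw with the middle vertex of 1 -> 2 -> 3."""
    A = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]], dtype=float)
    return similarity_matrix(A, Gw)[:, 1]


words, G = dictionary_graph(DICTIONARY)
candidates, Gw = neighborhood_graph(words, G, "disappear")
scores = central_scores(Gw)
ranking = [w for _, w in sorted(zip(-scores, candidates))]   # decreasing score
print(ranking)
```

Ties in the score are broken alphabetically by the sort. On a real dictionary, the neighborhood graph would be extracted from a sparse representation of G rather than a dense array.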
Fig. 7.1. Part of the neighborhood graph associated with the word "likely". The graph contains
all words used in the definition of "likely" and all words using "likely" in their definition. Synonyms
are identified by selecting in this graph those vertices of highest central score.

Before proceeding to the description of the results obtained, we briefly describe the 
dictionary graph. We used the Online Plain Text English Dictionary [2] which is based 
on the "Project Gutenberg Etext of Webster's Unabridged Dictionary" which is in 
turn based on the 1913 US Webster's Unabridged Dictionary. The dictionary consists 
of 27 HTML files (one for each letter of the alphabet, and one for several additions). 
These files are freely available from the web site http://www.gutenberg.net/. The
resulting graph has 112,169 vertices and 1,398,424 edges. It can be downloaded from
the web page http://www.eleves.ens.fr/home/senellar/.

In order to evaluate the quality of our synonym extraction method, we have
compared its results with three other lists of synonyms. Two of these (Distance and
ArcRank) were compiled automatically by two other synonym extraction methods (see [5]
for details; the method ArcRank is described in "TB"), and the third lists synonyms
obtained from the hand-made resource WordNet, freely available on the WWW [1].
The order of appearance of the words for this last source is arbitrary, whereas it is
well defined for the three other methods. We have not kept the query word in the list
of synonyms, since this makes little sense except for our method. For our method it is
interesting to note that in every example we have experimented with, the original word
appears as the first word of the list, a point that tends to give credit to our method.
We have examined the first ten results obtained on four query words chosen for their
variety:

1. disappear: a word with various synonyms such as vanish. 

2. parallelogram: a very specific word with no true synonyms but with some
similar words: quadrilateral, square, rectangle, rhomb...

3. sugar: a common word with different meanings (in chemistry, cooking,
dietetics...). One can expect glucose as a candidate.

4. science: a common and vague word. It is hard to say what to expect as a
synonym. Perhaps knowledge is the best candidate.

In order to have an objective evaluation of the different methods, we asked a
sample of 21 people to give a mark (out of 10) to the lists of synonyms, according
to their relevance to synonymy. The lists were of course presented in random order for
each word. The results obtained are given in Tables 7.1, 7.2, 7.3 and 7.4. The last
two lines of each of these tables give the average mark and its standard deviation.

Concerning disappear, the distance method and our method do pretty well; vanish,
cease, fade, die, pass are very relevant (one must not forget that verbs necessarily
appear without their postposition); dissipate or faint are relevant too. Some words
like light or port are completely irrelevant, but they appear only in 6th, 7th or 8th
position. If we compare these two methods, we observe that our method is better: an
important synonym like pass gets a good ranking, whereas port or appear are not in the
top ten words. It is hard to explain this phenomenon, but we can say that the mutually
reinforcing aspect of our method apparently has a positive effect. In contrast to this,
ArcRank gives rather poor results, with words such as eat, instrumental or epidemic
that are not to the point.

Table 7.1
Proposed synonyms for disappear

          Distance          Our method        ArcRank           Wordnet
1         vanish            vanish            epidemic          vanish
2         wear              pass              disappearing      go away
3         die               die               port              end
4         sail              wear              dissipate         finish
5         faint             faint             cease             terminate
6         light             fade              eat               cease
7         port              sail              gradually
8         absorb            light             instrumental
9         appear            dissipate         darkness
10        cease             cease             efface
Mark      3.6               6.3               1.2               7.5
Std dev.  1.8               1.7               1.2               1.4

Table 7.2
Proposed synonyms for parallelogram

          Distance          Our method        ArcRank           Wordnet
1         square            square            quadrilateral     quadrilateral
2         parallel          rhomb             gnomon            quadrangle
3         rhomb             parallel          right-lined       tetragon
4         prism             figure            rectangle
5         figure            prism             consequently
6         equal             equal             parallelepiped
7         quadrilateral     opposite          parallel
8         opposite          angles            cylinder
9         altitude          quadrilateral     popular
10        parallelepiped    rectangle         prism
Mark      4.6               4.8               3.3               6.3
Std dev.  2.7               2.5               2.2               2.5

Because the neighborhood graph of parallelogram is rather small (30 vertices),
the first two algorithms give similar results, which are reasonable: square, rhomb,
quadrilateral, rectangle, figure are rather interesting. Other words are less relevant
but are still in the semantic domain of parallelogram. ArcRank, which also works on
the same subgraph, does not give results of the same quality: consequently and popular
are clearly irrelevant, but gnomon is an interesting addition. It is interesting to note
that Wordnet is less rich here because it focuses on a particular aspect
(quadrilateral).

Once more, the results given by ArcRank for sugar are mainly irrelevant (property,
grocer, ...). Our method is again better than the distance method: starch, sucrose,
sweet, dextrose, glucose, lactose are highly relevant words, even if the first given
near-synonym (cane) is not as good. Note that our method has marks that are even
better than those of Wordnet.

The results for science are perhaps the most difficult to analyze. The distance 
method and ours are comparable. ArcRank gives perhaps better results than for other 
words but is still poorer than the two other methods. 

Table 7.3
Proposed synonyms for sugar

          Distance          Our method        ArcRank           Wordnet
1         juice             cane              granulation       sweetening
2         starch            starch            shrub             sweetener
3         cane              sucrose           sucrose           carbohydrate
4         milk              milk              preserve          saccharide
5         molasses          sweet             honeyed           organic compound
6         sucrose           dextrose          property          saccharify
7         wax               molasses          sorghum           sweeten
8         root              juice             grocer            dulcify
9         crystalline       glucose           acetate           edulcorate
10        confection        lactose           saccharine        dulcorate
Mark      3.9               6.3               4.3               6.2
Std dev.  2.0               2.4               2.3               2.9

Table 7.4
Proposed synonyms for science

          Distance          Our method        ArcRank           Wordnet
1         art               art               formulate         knowledge domain
2         branch            branch            arithmetic        knowledge base
3         nature            law               systematize       discipline
4         law               study             scientific        subject
5         knowledge         practice          knowledge         subject area
6         principle         natural           geometry          subject field
7         life              knowledge         philosophical     field
8         natural           learning          learning          field of study
9         electricity       theory            expertness        ability
10        biology           principle         mathematics       power
Mark      3.6               4.4               3.2               7.1
Std dev.  2.0               2.5               2.9               2.6

In conclusion, the first two algorithms give interesting and relevant words,
whereas it is clear that ArcRank is not adapted to the search for synonyms. The use of
the central score and its mutually reinforcing relationship demonstrates its superiority
over the basic distance method, even if the difference is not obvious for all words. The
quality of the results obtained with these different methods is still quite different from
that of hand-made dictionaries such as Wordnet. Still, these automatic techniques
show their interest, since they present more complete aspects of a word than hand-made
dictionaries. They can profitably be used to broaden a topic (see the example
of parallelogram) and to help with the compilation of synonym dictionaries.

8. Concluding remarks. In this paper, we introduce a new concept of similar- 
ity matrix and explain how to associate a score with the similarity of the vertices of 
two graphs. We show how this score can be computed and indicate how it extends 
the concept of hub and authority scores introduced by Kleinberg. We prove several 
properties and illustrate the strengths and weaknesses of this new concept. Investigations
of properties and applications of the similarity matrix of graphs can be pursued
in several directions. We outline some possible research directions. 

One natural extension of our concept is to consider networks rather than graphs;
this amounts to considering adjacency matrices with arbitrary real entries and not just
integers. The definitions and results presented in this paper use only the property
that the adjacency matrices involved have non-negative entries, and so all results
remain valid for networks with non-negative weights. The extension to networks
makes a sensitivity analysis possible: How sensitive is the similarity matrix to the
weights in the network? Experiments and qualitative arguments show that, for most
networks, similarity scores are almost everywhere continuous functions of the network
entries. Perhaps this can be analyzed for models of random graphs such as those that
appear in [7]? These questions can probably also be related to the large literature on
eigenvalues and invariant subspaces of graphs; see, e.g., [S], [H] and [TU].

It appears natural to investigate the possible use of the similarity matrix of two 
graphs to detect if the graphs are isomorphic. (The membership of the graph isomor- 
phism problem to the complexity classes P or NP-complete is so far unsettled.) If two 
graphs are isomorphic, then their similarity matrix can be made symmetric by column 
(or row) permutation. It is easy to check in polynomial time if such a permutation 
is possible and if it is unique (when all entries of the similarity matrix are distinct, 
it can only be unique). In the case where no such permutation exists or when only 
one permutation is possible, one can immediately conclude by answering negatively 
or by checking the proposed permutation. In the case where many permutations
render the similarity matrix symmetric, all of them have to be checked and this leads
to a possibly exponential number of permutations to verify. It appears interesting 
to see how this heuristic compares to other heuristics for graph isomorphism and to 
investigate if other features of the similarity matrix can be used to limit the number 
of permutations to consider. 
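The permutation test described above can be illustrated on a toy example. The Python/NumPy sketch below relies on our own assumptions. It searches all n! row permutations exhaustively, so it is usable only for very small graphs and is not the polynomial-time check mentioned in the text. It uses a floating-point tolerance to decide symmetry and then verifies each candidate directly against the adjacency matrices. The helper names and the test graph are hypothetical. The fact it relies on is this: if B = P A P^T for a permutation matrix P, then every iterate of the similarity iteration for (Ga, Gb) equals P times the corresponding iterate for (Ga, Ga). Hence P^T S is symmetric.

```python
import itertools

import numpy as np


def similarity_matrix(A, B, tol=1e-12, max_iter=1000):
    """Sketch: limit of the normalized even iterates of
    Z <- (B Z A^T + B^T Z A) / ||.||_F, starting from the all-ones matrix."""
    Z = np.ones((B.shape[0], A.shape[0]))
    for _ in range(max_iter):
        Z_next = Z
        for _ in range(2):
            Z_next = B @ Z_next @ A.T + B.T @ Z_next @ A
            Z_next /= np.linalg.norm(Z_next)
        if np.linalg.norm(Z_next - Z) < tol:
            return Z_next
        Z = Z_next
    return Z


def symmetrizing_row_permutations(S, atol=1e-8):
    """All row permutations p (exhaustive search) such that S[p, :] is symmetric."""
    result = []
    for p in itertools.permutations(range(S.shape[0])):
        T = S[list(p), :]
        if np.allclose(T, T.T, atol=atol):
            result.append(p)
    return result


def is_isomorphism(A, B, p):
    """True if mapping vertex i of Ga to vertex p[i] of Gb preserves all edges."""
    idx = list(p)
    return np.array_equal(B[np.ix_(idx, idx)], A)


# Hypothetical Ga: directed triangle 0->1->2->0 with an extra edge 0->3.
A = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (2, 0), (0, 3)]:
    A[i, j] = 1.0

# Gb: the same graph with vertex i relabeled as sigma[i].
sigma = [2, 0, 3, 1]
B = np.zeros_like(A)
B[np.ix_(sigma, sigma)] = A

S = similarity_matrix(A, B)                        # n_B x n_A
candidates = symmetrizing_row_permutations(S)
isomorphisms = [p for p in candidates if is_isomorphism(A, B, p)]
print(candidates, isomorphisms)
```

The test graph has no nontrivial automorphism, so the relabeling `sigma` should be the only verified isomorphism. Further symmetrizing permutations, if any, would be rejected by the direct edge check.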

More specific questions on the similarity matrix also arise. One open problem is 
to characterize the pairs of matrices that give rise to a rank one similarity matrix. 
The structure of these pairs is conjectured at the end of Section 6. Is this conjecture 
correct? A long-standing graph question also arises when trying to characterize the 
graphs whose similarity matrices have only positive entries. The positive entries of 
the similarity matrix between the graphs Ga and Gb can be obtained as follows.
First construct the product graph, symmetrize it, and then identify in the resulting
graph the connected component(s) of largest possible Perron root. The indices of
the vertices in that graph correspond exactly to the nonzero entries in the similarity
matrix of Ga and Gb. The entries of the similarity matrix will thus be all positive
if and only if the symmetrized product graph is connected, that is, if and only if the
product graph of Ga and Gb is weakly connected. The problem of characterizing all
pairs of graphs that have a weakly connected product was introduced and analyzed
in 1966 in Jl]. That reference provides sufficient conditions for the product to be
weakly connected. Despite several subsequent contributions on this question (see, e.g.,
[12]), the problem of efficiently characterizing all pairs of graphs that have a weakly
connected product is, to our knowledge, still open.
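The recipe for locating the nonzero entries of S can be sketched in Python with NumPy. It relies on our own assumptions: hypothetical helper names, a depth-first search for connected components, and a tolerance for comparing Perron roots. The sketch builds M = A ⊗ B + A^T ⊗ B^T, whose nonzero pattern is the symmetrized product graph. It keeps the component(s) with the largest Perron root and compares the prediction with the entries of S computed by iteration. With `np.kron` in this order, vec(S) stacks the columns of S, hence the Fortran-order reshape at the end. The two test cases are a directed path, for which Theorem 8 predicts a diagonal S, and a hypothetical transitive triangle (edges 0->1, 0->2, 1->2), whose product graph has two isolated vertices.

```python
import numpy as np


def similarity_matrix(A, B, tol=1e-12, max_iter=1000):
    """Sketch: limit of the normalized even iterates of
    Z <- (B Z A^T + B^T Z A) / ||.||_F, starting from the all-ones matrix."""
    Z = np.ones((B.shape[0], A.shape[0]))
    for _ in range(max_iter):
        Z_next = Z
        for _ in range(2):
            Z_next = B @ Z_next @ A.T + B.T @ Z_next @ A
            Z_next /= np.linalg.norm(Z_next)
        if np.linalg.norm(Z_next - Z) < tol:
            return Z_next
        Z = Z_next
    return Z


def predicted_support(A, B, tol=1e-9):
    """Entries of S predicted nonzero: vertices of the connected components of the
    symmetrized product graph whose Perron root is largest (n_B x n_A mask)."""
    nA, nB = A.shape[0], B.shape[0]
    M = np.kron(A, B) + np.kron(A.T, B.T)     # symmetric; vec(S) stacks columns of S
    label = -np.ones(nA * nB, dtype=int)
    roots = []
    for s in range(nA * nB):                  # depth-first search for components
        if label[s] >= 0:
            continue
        c = len(roots)
        label[s] = c
        stack, members = [s], [s]
        while stack:
            u = stack.pop()
            for w in np.flatnonzero(M[u]):
                if label[w] < 0:
                    label[w] = c
                    stack.append(w)
                    members.append(w)
        block = M[np.ix_(members, members)]
        roots.append(np.abs(np.linalg.eigvalsh(block)).max())
    best = max(roots)
    mask = np.array([roots[c] >= best - tol for c in label])
    return mask.reshape((nB, nA), order="F")  # undo column stacking


path = np.eye(3, k=1)                         # directed path 0 -> 1 -> 2
tri = np.zeros((3, 3))                        # hypothetical transitive triangle
for i, j in [(0, 1), (0, 2), (1, 2)]:
    tri[i, j] = 1.0

S_path = similarity_matrix(path, path)
S_tri = similarity_matrix(tri, tri)
print(np.array_equal(predicted_support(path, path), S_path > 1e-9))
print(np.array_equal(predicted_support(tri, tri), S_tri > 1e-9))
```

The threshold 1e-9 used to read off the support of S is an assumption. Entries that vanish only in the limit decay geometrically and end up near the stopping tolerance, not exactly at zero.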

Another topic of interest is to investigate how the concepts proposed here can be 
used, possibly in modified form, for evaluating the similarity between two graphs, for 
clustering vertices or graphs, for pattern recognition in graphs and for data mining 
purposes. 